The Journal of China Universities of Posts and Telecommunications, 2013, Vol. 20, Issue 6: 77-87. doi: 10.1016/S1005-8885(13)60112-0

• Networks •

MapReduce optimization algorithm based on machine learning in heterogeneous cloud environment

Wenhui Lin 1, Zhenming Lei 2, Jun Liu 1, Jie Yang 3, Fang Liu 3, Gang He 1

  1. Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  3. Technology Research Institute, Aisino Corporation, Beijing 100195, China
  4. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-04-15; Revised: 2013-10-10; Online: 2013-12-31; Published: 2013-12-27
  • Contact: Wenhui Lin, E-mail: linwh16@gmail.com
  • Supported by:
    This work was supported by the Important National Science & Technology Specific Projects (2012ZX03002008), the 111 Project of China (B08004), and the Fundamental Research Funds for the Central Universities (2012RC0121).

Abstract: We present an approach to optimizing the MapReduce architecture that makes heterogeneous cloud environments more stable and efficient. Unlike previous methods, our approach introduces machine learning into the MapReduce framework and dynamically improves the MapReduce algorithm according to the statistics produced by the machine learning module. The approach has three main aspects: learning machine performance, a reduce task assignment algorithm based on the learning results, and a speculative execution optimization mechanism. It also has two important features. First, the MapReduce framework obtains the performance values of the nodes in the cluster through the machine learning module, which recalibrates these values daily to keep the assessment of cluster performance accurate. Second, the optimized task assignment algorithm lets us maximize the performance of heterogeneous clusters. According to our evaluation, cluster performance can improve by 19% in the current heterogeneous cloud environment, and cluster stability is greatly enhanced.

Key words: cloud computing, MapReduce, machine learning, heterogeneity
